A Generalized Additive Model for Temperature Forecasting in the United States

Conor Olive

Methods

Observations

  • Observations are obtained from Castro (2023) and comprise readings from U.S. weather stations and SNOTEL stations
  • Data spans the period from 1 January 2017 to 21 September 2017
Figure 1: Inverse distance weighted interpolation of temp_avg from each station for each day in the data set.
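
The interpolation named in Figure 1 can be illustrated with a small base-R sketch. This is a hand-rolled version for intuition only, not the code used to produce the figure: the function name `idw_predict`, the power of 2, and the use of plain Euclidean distance in coordinate space are all assumptions.

```r
# Hypothetical helper: inverse distance weighted estimate at a single point.
# x, y   : coordinates of the stations
# value  : observed value at each station (e.g. temp_avg)
# x0, y0 : coordinates of the point to estimate
# power  : distance exponent (2 is a common default, assumed here)
idw_predict <- function(x, y, value, x0, y0, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)

  # If the target coincides with a station, return that station's value
  # directly to avoid dividing by zero.
  if (any(d == 0)) {
    return(mean(value[d == 0]))
  }

  w <- 1 / d^power          # nearer stations receive larger weights
  sum(w * value) / sum(w)   # weighted average of the observations
}

# Two toy stations with values 10 and 20; the midpoint is equally
# distant from both, so the estimate is their plain average.
midpoint_estimate <- idw_predict(c(0, 2), c(0, 0), c(10, 20), x0 = 1, y0 = 0)
```

Repeating this estimate over a grid of points, once per day, gives a surface like the one in Figure 1.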

Predictors and Outcome

Our interest is in creating a model from the available data using

  • latitude
  • longitude
  • elevation
  • date

as predictors of temp_avg.

Data Cleaning

  • Since date is encoded as a string, we convert it to a date object and then to the number of days since 1 January 2017
  • We also restrict our data to the lower 48 U.S. states and the District of Columbia. To accomplish this, we must
    • Convert our data.frame into a geospatial sf data frame object
    • Load shapefiles for the area of interest
    • Exclude observations spatially located outside of the shapefile
    • Convert the sf back into a data.frame object
  • The resulting data, to which we may now fit our model, look like this:
day  elevation  temp_avg  longitude  latitude
  0      110.0     53.24   -88.7714   34.2622
  0      149.0     44.60  -119.0542   35.4344
  0     1287.8     22.46  -111.9694   40.7781
  0     1261.9      0.50  -118.9564   43.5950
  0     1364.9     14.90  -112.5711   42.9203
  0      240.5     62.06   -98.4839   29.5442
  0       95.1     35.42   -73.8092   42.7431
  0      246.9     39.38   -82.8808   39.9914
  0      360.6     14.18   -93.3981   48.5614
  • Before continuing, it is worth noting the spatial distribution of the observations in Figure 2. Specifically, there is a much higher density of observations in the inter-mountain west, which has implications for our model. These will be discussed in more detail later.

Figure 2: Spatial distribution of the observations within the data set.
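
The cleaning steps above can be sketched with the sf package (Pebesma 2018). This is a minimal illustration under stated assumptions, not the original cleaning script: the `%Y-%m-%d` date format, the three toy stations, and the rough bounding polygon standing in for the shapefile are all hypothetical. In the actual analysis the region would be loaded from a shapefile (for example with `st_read()`), whose path the source does not give.

```r
library(sf)

# Toy stand-in for the raw station data (illustrative values only).
# The "%Y-%m-%d" date format is an assumption about the raw file.
stations <- data.frame(
  date      = c("2017-01-01", "2017-01-02", "2017-09-21"),
  longitude = c(-88.7714, -149.9003, -111.9694),
  latitude  = c(34.2622, 61.2181, 40.7781)
)

# Step 1: string -> Date -> number of days since 1 January 2017.
stations$day <- as.numeric(difftime(
  as.Date(stations$date, format = "%Y-%m-%d"),
  as.Date("2017-01-01"),
  units = "days"
))

# Step 2: convert to an sf object; remove = FALSE keeps the
# longitude and latitude columns for later modelling.
stations_sf <- st_as_sf(
  stations,
  coords = c("longitude", "latitude"),
  crs = 4326,
  remove = FALSE
)

# Step 3: area of interest. A crude box is used here ONLY as a
# placeholder for the lower-48 shapefile used in the real analysis.
box <- st_polygon(list(rbind(
  c(-125, 24), c(-66, 24), c(-66, 50), c(-125, 50), c(-125, 24)
)))
region <- st_sfc(box, crs = 4326)

# Step 4: keep only observations that fall inside the region.
inside <- st_filter(stations_sf, region)

# Step 5: drop the geometry to get back a plain data.frame.
cleaned <- st_drop_geometry(inside)
```

In this toy example the second station lies far outside the box and is dropped, leaving two rows with `day` values 0 and 263.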

  • We may plot temp_avg against one predictor at a time and see what kind of trends, if any, exist in the data. Figures 3 to 6 show temp_avg against time, elevation, latitude, and longitude in turn

Figure 3: Average daily temperature versus time for all days.

Figure 4: Average daily temperature versus elevation for all days.

Figure 5: Average daily temperature versus latitude for all days.

Figure 6: Average daily temperature versus longitude for all days.

Elevation and the Ideal Gas Law

  • Based on a physical understanding of the weather, the correlations of temp_avg with both latitude and elevation are expected
  • First, atmospheric pressure decreases with elevation, approximately in proportion over the range considered here. In turn, Gay-Lussac’s law informs us that, for a gas at fixed volume, \[P \propto T\]
  • Therefore, we transitively expect temperature to decrease in proportion to elevation, all else being equal

Solar Irradiance and Latitude

  • Solar irradiance is the primary forcing on surface temperature
  • Its intensity decreases with distance from the equator because of the increasingly oblique angle of incidence at which solar rays strike the Earth

Demonstration of the relationship between angle of incidence and intensity of solar irradiance.

  • Ignoring the effect of the atmosphere on solar irradiance, geometric reasoning tells us that the solar intensity \(I\) at a given latitude \(\theta\), where \(I_0\) is the intensity at the equator, is \[ I = I_0 \cos{\theta}.\]
  • However, in our data, where our latitude varies from approximately 25 to 50 degrees, this function \(I\) is roughly linear, as shown in Figure 7.

Figure 7: Theoretical irradiance versus latitude in the lower 48 states.
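
The claim of rough linearity can be checked numerically. The sketch below, in base R, evaluates \(\cos\theta\) on a grid between 25 and 50 degrees and fits a straight line to it; the grid spacing and the variable names are illustrative choices, and \(I_0\) is set to 1 so the values are relative irradiance.

```r
# Latitudes spanning the approximate range of the data, in degrees.
latitude <- seq(25, 50, by = 0.5)

# Relative irradiance I / I_0 = cos(theta), with theta in radians.
relative_irradiance <- cos(latitude * pi / 180)

# Fit a straight line and measure how well it describes the curve.
linear_fit <- lm(relative_irradiance ~ latitude)
r_squared <- summary(linear_fit)$r.squared
max_abs_error <- max(abs(residuals(linear_fit)))

r_squared       # close to 1 when the curve is nearly linear
max_abs_error   # largest gap between the curve and the line
```

An \(R^2\) near 1 and a small maximum residual over this range support treating latitude as a linear term in the model.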

Procedure

  • Because temperature reflects a combination of roughly linear and highly non-linear physical effects, we choose to fit the data set with a Generalized Additive Model, as implemented in the gam package (Hastie 2023).
  • Our particular model is

\[\text{temp_avg} = \beta_0 + \beta_1 (\text{elevation}) + \beta_2 (\text{latitude}) + \beta_3 (\text{longitude}) + \beta_4 f(\text{day}),\]

where \(f\) is a smooth function of day. Elevation, latitude, and longitude enter as linear terms since we hypothesize the partial derivatives of temp_avg with respect to each of them to be roughly constant.
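
A minimal sketch of fitting a model of this form with the gam package follows. The `s(day, df = 5)` term matches the term name reported in the Results; everything else is an assumption for illustration: the data are synthetic (simulated, not the station data), the coefficients used to simulate them are made up, and the 80/20 train/test split is a guess, since the source does not state how the split was made.

```r
library(gam)

set.seed(1)
n <- 600

# Synthetic predictors roughly spanning the ranges described in the text.
weather <- data.frame(
  day       = sample(0:263, n, replace = TRUE),
  elevation = runif(n, 0, 3000),
  latitude  = runif(n, 25, 50),
  longitude = runif(n, -125, -67)
)

# Synthetic response: linear in space, non-linear in time, plus noise.
# These coefficients are invented purely to generate example data.
weather$temp_avg <- 80 - 0.007 * weather$elevation -
  1.2 * weather$latitude +
  20 * sin(pi * weather$day / 365) +
  rnorm(n, sd = 5)

# Hypothetical 80/20 split into training and test sets.
train_idx <- sample(n, size = 0.8 * n)
train  <- weather[train_idx, ]
test_x <- weather[-train_idx, c("day", "elevation", "latitude", "longitude")]
test_y <- weather$temp_avg[-train_idx]

# Linear terms for the spatial predictors, a smoothing spline for day.
gam_weather <- gam(
  temp_avg ~ elevation + latitude + longitude + s(day, df = 5),
  data = train
)

coef(gam_weather)                               # one coefficient per term
gam_predict <- predict(gam_weather, newdata = test_x)
```

The names `gam_weather`, `test_x`, and `test_y` are chosen to match those used in the Results, so the RMSE calculation there applies unchanged to this toy fit.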

Results

  • Finally, we fit our model
  • The resulting coefficients \(\beta_i\) in our gam_weather model are
               gam_weather.coefficients
(Intercept)                80.568474227
elevation                  -0.006978337
longitude                  -0.065433432
latitude                   -1.192853051
s(day, df = 5)              0.168449564
  • Using our testing set of data, we calculate the root mean square deviation
# Use the model to make predictions on our test set
gam_predict <- predict(gam_weather, test_x)

# Calculate the RMSE from the residuals between our predictions and the test set temp_avg
gam_rmse <- sqrt(mean((gam_predict - test_y)^2))
  • We get a root mean square deviation (RMSD) of 7.834574
  • Partial residual plots for each of the independent predictors are shown below in Figure 8.

Figure 8: Partial residuals and components of the model.

Conclusions

  • We aimed to forecast temperature in the US from January 1, 2017, to September 21, 2017 using latitude, longitude, elevation, and date
  • Fitting a Generalized Additive Model to the data yielded acceptable results, with an RMSE of 7.834574, and captured several predicted trends
  • The resulting model is a reasonably good predictor of temperature within the time frame of the data, especially within the inter-mountain west, where observations are densest, but it performs worse outside this region
Figure 9: Temperature residuals by station over time.

References

Castro, Spencer. 2023. “Weather Data.” https://catcourses.ucmerced.edu/files/6731022/download?download_frd=1.
Gräler, Benedikt, Edzer Pebesma, and Gerard Heuvelink. 2016. “Spatio-Temporal Interpolation Using Gstat.” The R Journal 8: 204–18. https://journal.r-project.org/archive/2016/RJ-2016-014/index.html.
Hastie, Trevor. 2023. Gam: Generalized Additive Models.
Hijmans, Robert J. 2023. Raster: Geographic Data Analysis and Modeling. https://rspatial.org/raster.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.